readability metric
Evaluating the Evaluators: Are readability metrics good measures of readability?
Cachola, Isabel, Khashabi, Daniel, Dredze, Mark
Plain Language Summarization (PLS) aims to distill complex documents into accessible summaries for non-expert audiences. In this paper, we conduct a thorough survey of PLS literature, and identify that the current standard practice for readability evaluation is to use traditional readability metrics, such as Flesch-Kincaid Grade Level (FKGL). However, despite proven utility in other fields, these metrics have not been compared to human readability judgments in PLS. We evaluate 8 readability metrics and show that most correlate poorly with human judgments, including the most popular metric, FKGL. We then show that Language Models (LMs) are better judges of readability, with the best-performing model achieving a Pearson correlation of 0.56 with human judgments. Extending our analysis to PLS datasets, which contain summaries aimed at non-expert audiences, we find that LMs better capture deeper measures of readability, such as required background knowledge, and lead to different conclusions than the traditional metrics. Based on these findings, we offer recommendations for best practices in the evaluation of plain language summaries. We release our analysis code and survey data.
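To make the paper's core comparison concrete, here is a minimal sketch of the kind of check it performs: score texts with FKGL and correlate the scores against human readability judgments. The summaries and ratings below are invented for illustration (the paper evaluates 8 metrics on real PLS data); `textstat` and `scipy` are assumed as convenient stand-ins for whatever tooling the authors used.

```python
# Minimal sketch: FKGL scores vs. human readability ratings.
# Summaries and ratings are hypothetical, not the paper's data.
import textstat
from scipy.stats import pearsonr

summaries = [
    "The heart pumps blood to every part of the body.",
    "Cells use oxygen to release energy from food.",
    "Myocardial contractility modulates systemic hemodynamic perfusion pressure.",
    "Mitochondrial oxidative phosphorylation sustains cellular bioenergetics.",
]
human_ratings = [5.0, 4.5, 1.5, 1.0]  # hypothetical: higher = easier to read

fkgl = [textstat.flesch_kincaid_grade(s) for s in summaries]
# FKGL is a grade level (higher = harder), so a strong *negative*
# correlation with the ratings would indicate good agreement.
r, p = pearsonr(fkgl, human_ratings)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```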
Exploring the change in scientific readability following the release of ChatGPT
The rise and growing popularity of accessible large language models have raised questions about their impact on various aspects of life, including how scientists write and publish their research. The primary objective of this paper is to analyze a dataset consisting of all abstracts posted on arXiv.org between 2010 and June 7th, 2024, to assess the evolution of their readability and determine whether significant shifts occurred following the release of ChatGPT in November 2022. Four standard readability formulas are used to calculate individual readability scores for each paper, classifying their level of readability. These scores are then aggregated by year and across the eight primary categories covered by the platform. The results show a steady annual decrease in readability, suggesting that abstracts are likely becoming increasingly complex. Additionally, following the release of ChatGPT, a significant change in readability is observed for 2023 and the analyzed months of 2024. Similar trends are found across categories, with most experiencing a notable change in readability during 2023 and 2024. These findings offer insights into the broader changes in readability and point to the likely influence of AI on scientific writing.
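A rough sketch of this style of trend analysis, using one standard formula (Flesch Reading Ease via `textstat`, an assumed stand-in) and aggregating by year with pandas. The records are made up; the paper scores every arXiv abstract from 2010 through mid-2024 with four formulas across eight categories.

```python
# Sketch: aggregate a readability formula by publication year.
# Hypothetical records; the paper uses all arXiv abstracts, 2010-2024.
import pandas as pd
import textstat

records = [
    {"year": 2015, "abstract": "We study simple graphs and give a short proof."},
    {"year": 2015, "abstract": "A new bound for sorting networks is shown."},
    {"year": 2023, "abstract": "We propose a parameter-efficient self-distillation "
                               "framework leveraging heterogeneous multimodal representations."},
]
df = pd.DataFrame(records)
df["flesch"] = df["abstract"].apply(textstat.flesch_reading_ease)
print(df.groupby("year")["flesch"].mean())  # lower score = harder text
```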
Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring
Almasi, Mina, Kristensen-McLachlan, Ross Deans
This paper investigates the potential of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning. In particular, we evaluate whether system prompting can reliably constrain LLMs to generate only text appropriate to the student's competence level. We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters. Dialogues are generated by having an LLM alternate between tutor and student roles with separate chat histories. The output from the tutor model is then used to evaluate the effectiveness of CEFR-based prompting to control text difficulty across three proficiency levels (A1, B1, C1). Our findings suggest that while system prompting can be used to constrain model outputs, prompting alone is too brittle for sustained, long-term interactional contexts - a phenomenon we term alignment drift. Our results provide insights into the feasibility of LLMs as personalized, proficiency-aligned adaptive tutors and offer a scalable method for low-cost evaluation of model performance without human participants.
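A sketch of the dialogue-simulation loop the abstract describes: one model alternates between tutor and student roles, each side keeping its own chat history. Everything here is an assumption for illustration - `chat()` is a placeholder for any instruction-tuned 7B-12B model call, and the prompts are not the paper's.

```python
# Sketch of CEFR-prompted tutor/student simulation with separate histories.
TUTOR_SYSTEM = (
    "You are a Spanish tutor. Use only vocabulary and grammar "
    "appropriate for CEFR level A1."  # level varied per experiment
)
STUDENT_SYSTEM = "You are a beginner student of Spanish. Reply briefly in Spanish."

def chat(history):
    """Placeholder: swap in a call to any instruction-tuned LLM."""
    return "Respuesta de ejemplo."

def simulate_dialogue(turns=10):
    tutor_hist = [{"role": "system", "content": TUTOR_SYSTEM}]
    student_hist = [{"role": "system", "content": STUDENT_SYSTEM}]
    message = "Hola, ¿podemos practicar español?"
    transcript = []
    for _ in range(turns):
        # Tutor sees the student's last message in its own history only.
        tutor_hist.append({"role": "user", "content": message})
        tutor_reply = chat(tutor_hist)
        tutor_hist.append({"role": "assistant", "content": tutor_reply})
        transcript.append(("tutor", tutor_reply))
        # Student keeps a separate history, so the roles never leak.
        student_hist.append({"role": "user", "content": tutor_reply})
        message = chat(student_hist)
        student_hist.append({"role": "assistant", "content": message})
        transcript.append(("student", message))
    return transcript  # tutor turns are then scored for CEFR alignment
```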
The Use of Readability Metrics in Legal Text: A Systematic Literature Review
Han, Yu, Ceross, Aaron, Bergmann, Jeroen H. M.
Understanding the text in legal documents can be challenging due to their complex structure and the inclusion of domain-specific jargon. Laws and regulations are often crafted in such a manner that engagement with them requires formal training, potentially leading to vastly different interpretations of the same texts. Linguistic complexity is an important contributor to the difficulties experienced by readers. Simplifying texts could enhance comprehension across a broader audience, not just among trained professionals. Various metrics have been developed to measure document readability. Therefore, we adopted a systematic review approach to examine the linguistic and readability metrics currently employed for legal and regulatory texts. A total of 3566 initial papers were screened, with 34 relevant studies found and further assessed. Our primary objective was to identify which current metrics were applied for evaluating readability within the legal field. Sixteen different metrics were identified, with the Flesch-Kincaid Grade Level being the most frequently used method. The majority of studies (73.5%) were found in the domain of "informed consent forms". From the analysis, it is clear that not all legal domains are well represented in terms of readability metrics and that there is a further need to develop more consensus on which metrics should be applied for legal documents.
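For reference, the Flesch-Kincaid Grade Level that dominates this literature is a fixed linear formula over surface statistics:

```latex
\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right)
              + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
```

Its output is a U.S. school grade level, which may help explain its prevalence in the informed-consent literature, where target grade levels are commonly recommended.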
MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain
Medical texts are notoriously challenging to read. Properly measuring their readability is the first step towards making them more accessible. In this paper, we present a systematic study on fine-grained readability measurements in the medical domain at both sentence-level and span-level. We introduce a new dataset MedReadMe, which consists of manually annotated readability ratings and fine-grained complex span annotation for 4,520 sentences, featuring two novel "Google-Easy" and "Google-Hard" categories. It supports our quantitative analysis, which covers 650 linguistic features and automatic complex word and jargon identification. Enabled by our high-quality annotation, we benchmark and improve several state-of-the-art sentence-level readability metrics for the medical domain specifically, which include unsupervised, supervised, and prompting-based methods using recently developed large language models (LLMs). Informed by our fine-grained complex span annotation, we find that adding a single feature, capturing the number of jargon spans, into existing readability formulas can significantly improve their correlation with human judgments. We will publicly release the dataset and code.
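A minimal sketch of the jargon-feature idea: fit a linear model over an existing formula's score plus the number of annotated jargon spans, and check whether the extra feature earns weight. The numbers below are hypothetical; MedReadMe supplies real ratings and span annotations for 4,520 sentences.

```python
# Sketch: augment a readability formula with a jargon-span count.
# All values are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

fkgl = np.array([6.2, 11.5, 9.8, 14.1])   # existing formula's scores
n_jargon = np.array([0, 3, 1, 5])         # annotated jargon spans per sentence
human = np.array([1.5, 4.0, 2.5, 5.0])    # difficulty ratings (higher = harder)

X = np.column_stack([fkgl, n_jargon])
model = LinearRegression().fit(X, human)
print(model.coef_)  # a non-trivial weight on n_jargon supports the finding
```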
Know Your Audience: Do LLMs Adapt to Different Age and Education Levels?
Rooein, Donya, Curry, Amanda Cercas, Hovy, Dirk
Large language models (LLMs) offer a range of new possibilities, including adapting the text to different audiences and their reading needs. But how well do they adapt? We evaluate the readability of answers generated by four state-of-the-art LLMs (commercial and open-source) to science questions when prompted to target different age groups and education levels. To assess the adaptability of LLMs to diverse audiences, we compare the readability scores of the generated responses against the recommended comprehension level of each age and education group. We find large variations in the readability of the answers by different LLMs. Our results suggest LLM answers need to be better adapted to the intended audience demographics to be more comprehensible. They underline the importance of enhancing the adaptability of LLMs in education settings to cater to diverse age and education levels. Overall, current LLMs have set readability ranges and do not adapt well to different audiences, even when prompted. That limits their potential for educational purposes.
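A sketch of the evaluation loop the abstract describes: prompt a model for each audience, score the answer with a readability formula, and compare against the recommended comprehension level. `generate()` is a placeholder for any of the evaluated LLMs, and the grade-target mapping is illustrative, not the paper's exact table.

```python
# Sketch: does the generated answer hit the audience's reading level?
import textstat

TARGET_GRADE = {"a 10-year-old": 5, "a 15-year-old": 10, "a university student": 13}

def generate(prompt):
    """Placeholder: swap in a real LLM call here."""
    return "The sky looks blue because air scatters blue light more than red light."

question = "Why is the sky blue?"
for audience, target in TARGET_GRADE.items():
    answer = generate(f"Explain to {audience}: {question}")
    grade = textstat.flesch_kincaid_grade(answer)
    print(f"{audience}: FKGL {grade:.1f} (target {target}, gap {grade - target:+.1f})")
```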
LC-Score: Reference-less estimation of Text Comprehension Difficulty
Tardy, Paul, Roze, Charlotte, Poupet, Paul
Being able to read and understand written text is critical in a digital era. However, studies show that a large fraction of the population experiences comprehension issues. In this context, further accessibility initiatives are required to improve text comprehension among these audiences. However, writers are rarely assisted or encouraged to produce easy-to-understand content. Moreover, the development of Automatic Text Simplification (ATS) models suffers from the lack of a metric that accurately estimates comprehension difficulty. We present LC-Score, a simple approach for training a reference-less text comprehension metric for any French text, i.e., predicting how easy to understand a given text is on a [0, 100] scale. Our objective with this scale is to quantitatively capture the extent to which a text conforms to the Langage Clair (LC, "Clear Language") guidelines, a French initiative closely related to English Plain Language. We explore two approaches: (i) training statistical models on linguistically motivated indicators, and (ii) neural learning directly from text by leveraging pre-trained language models. We introduce a simple proxy task that frames comprehension difficulty training as a classification task. To evaluate our models, we run two distinct human annotation experiments and find that both approaches (indicator-based and neural) outperform commonly used readability and comprehension metrics such as FKGL.
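A sketch of approach (i): simple linguistic indicators feeding a statistical classifier, with difficulty binned into classes as a proxy task. The features, French snippets, and binary bins are all illustrative assumptions standing in for the paper's indicator set and [0, 100] scale.

```python
# Sketch: indicator-based comprehension-difficulty classifier (proxy task).
from sklearn.linear_model import LogisticRegression

def indicators(text):
    """A few crude, linguistically motivated features (illustrative only)."""
    words = text.split()
    n_sents = max(text.count(".") + text.count("!") + text.count("?"), 1)
    return [
        len(words) / n_sents,                             # mean sentence length
        sum(len(w) for w in words) / max(len(words), 1),  # mean word length
    ]

# Hypothetical snippets binned into easy (0) / hard (1) classes.
texts = [
    "Le chat dort sur le lit.",
    "Il fait beau aujourd'hui.",
    "Les dispositions susmentionnées s'appliquent nonobstant toute stipulation contraire.",
    "L'interprétation téléologique des normes exige une contextualisation approfondie.",
]
labels = [0, 0, 1, 1]
clf = LogisticRegression().fit([indicators(t) for t in texts], labels)
print(clf.predict([indicators("La jurisprudence afférente demeure incertaine.")]))
```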
TextDescriptives: A Python package for calculating a large variety of metrics from text
Hansen, Lasse, Olsen, Ludvig Renbo, Enevoldsen, Kenneth
TextDescriptives is a Python package for calculating a large variety of metrics from text. It is built on top of spaCy and can be easily integrated into existing workflows. The package has already been used for analysing the linguistic stability of clinical texts, creating features for predicting neuropsychiatric conditions, and analysing linguistic goals of primary school students. This paper describes the package and its features.
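A minimal usage sketch, assuming the package's spaCy-component API as documented for recent (v2-style) releases - component names like "textdescriptives/readability" and the extract_df helper; consult the package docs if the API has shifted.

```python
# Usage sketch for TextDescriptives via its spaCy pipeline components.
import spacy
import textdescriptives as td

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textdescriptives/readability")  # "textdescriptives/all" for everything
doc = nlp("TextDescriptives computes a large variety of metrics from text.")
print(td.extract_df(doc))  # one row per doc with readability columns
```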